Exemplar Guided Active Learning

Neural Information Processing Systems

We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. For example, consider the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a knowledge base, but the label set is not necessarily representative of what occurs in the data: there may exist labels in the knowledge base that very rarely occur in the corpus because the sense is rare in modern English; and conversely there may exist true labels that do not exist in our knowledge base. Our aim is to obtain a classifier that performs as well as possible on examples of each "common class" that occurs with frequency above a given threshold in the unlabeled set while annotating as few examples as possible from "rare classes" whose labels occur with less than this frequency. The challenge is that we are not informed which labels are common and which are rare, and the true label distribution may exhibit extreme skew.
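The common/rare split described above can be made concrete with a small sketch; the toy sense labels and the 0.05 threshold below are invented for illustration and are not from the paper:

```python
from collections import Counter

def common_classes(labels, threshold):
    """Classes whose empirical frequency in `labels` meets `threshold`."""
    n = len(labels)
    counts = Counter(labels)
    return {c for c, k in counts.items() if k / n >= threshold}

# Toy skewed sense distribution for the word "bank" (invented).
labels = ["bank/finance"] * 90 + ["bank/bench"] * 8 + ["bank/river"] * 2
common_classes(labels, threshold=0.05)
# "bank/river" occurs with frequency 0.02 < 0.05, so it counts as rare
```

The difficulty the abstract points to is that in the real setting these frequencies are unknown: the learner cannot compute this split and must spend labeling budget without knowing which classes are below the threshold.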


Review for NeurIPS paper: Exemplar Guided Active Learning

Neural Information Processing Systems

Why is the sampling strategy switched to uncertainty sampling once an example is collected? Is it because, after the classifier sees one example from a rare class, it can start assigning high uncertainty to that class? If that is the case, I do not understand why we cannot use the initial exemplar (from WordNet?), which we assume to be available at the beginning, to train the classifier and use uncertainty sampling directly from the start. Minor questions: 1. Did you try cosine distance rather than L2 distance in the guided search? It might improve the performance a little.
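The reviewer's cosine-versus-L2 question is not vacuous: the two metrics can rank the same candidates differently, since L2 is sensitive to embedding norm while cosine distance is not. A toy sketch with invented two-dimensional "embeddings" (not from the paper):

```python
import math

def l2(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_dist(u, v):
    """1 - cosine similarity; ignores vector magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

exemplar = [1.0, 0.0]
# x1 points in the exemplar's direction but has a larger norm;
# x2 points elsewhere but sits closer in Euclidean terms.
pool = {"x1": [2.0, 0.0], "x2": [0.6, 0.8]}

nearest_l2 = min(pool, key=lambda k: l2(exemplar, pool[k]))        # "x2"
nearest_cos = min(pool, key=lambda k: cosine_dist(exemplar, pool[k]))  # "x1"
```

Here the guided search would query different examples depending on the metric, which is why the choice could plausibly affect performance.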


Review for NeurIPS paper: Exemplar Guided Active Learning

Neural Information Processing Systems

The paper proposes to select unlabelled training examples based on the embedding distance between a given exemplar and the query data. A pretrained BERT model is used to compute the embeddings for the training examples. The problem formulation of selecting balanced labels from a highly skewed training set, together with the complexity bound, is appreciated by all the reviewers. The general consensus is that the paper adds an interesting contribution to active learning methods applied to word sense disambiguation. The current version of the paper would be greatly strengthened by evaluation on more datasets.
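A minimal sketch of the selection rule the reviews describe, namely guided search by embedding distance to the exemplar, followed by a switch to uncertainty sampling once the class has a labeled example. The function name, toy embeddings, and uncertainty scores below are all hypothetical, and in the paper's setting the embeddings would come from a pretrained BERT model:

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def select_next(exemplar_emb, pool_embs, uncertainty, class_found):
    """One query-selection step of the two-phase strategy (hypothetical).

    pool_embs:   dict mapping unlabeled example id -> embedding
    uncertainty: dict mapping example id -> classifier uncertainty score
    class_found: whether a labeled example of the class exists yet
    """
    if not class_found:
        # Guided phase: query the unlabeled point closest to the exemplar.
        return min(pool_embs, key=lambda k: l2(exemplar_emb, pool_embs[k]))
    # Uncertainty phase: query the point the classifier is least sure about.
    return max(pool_embs, key=lambda k: uncertainty[k])

pool = {"x1": [0.9, 0.1], "x2": [0.1, 0.9]}
unc = {"x1": 0.2, "x2": 0.7}
first = select_next([1.0, 0.0], pool, unc, class_found=False)  # "x1"
later = select_next([1.0, 0.0], pool, unc, class_found=True)   # "x2"
```

This also locates the reviewer's question above: once `class_found` flips, the exemplar embedding plays no further role, so the value of the guided phase rests on it finding that first example cheaply.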

